Finding All Tandem Arrays in DNA Sequences
Abstract
A tandem array is a substring of the form $x^k$, where $x$ is an arbitrary substring and $k \geq 2$ (when $k = 2$, $x^2$ is also called a tandem repeat or square). A non-extendable tandem array occurring in a string $S$ is a tandem array $x^k$ that is neither followed nor preceded by another occurrence of $x$ in $S$. This thesis addresses the following problem: given a string $S$ of length $n$, find all non-extendable tandem arrays of $S$. We present an $O(n \log n)$ algorithm for this problem, based on a variation of an $O(n \log n)$ algorithm for finding all squares in $S$. Instead of finding all non-extendable tandem arrays directly, we first find all squares and then use them to construct all non-extendable $p$-periodic substrings, defined as follows: a non-extendable $p$-periodic substring is a substring $Y$ of maximal length $m$ such that the $i$th character of $Y$ equals the $(i+p)$th character of $Y$ for all $i \in \{1, \ldots, m-p\}$, and $\lfloor m/p \rfloor \geq 2$. Once all non-extendable $p$-periodic substrings have been found, all non-extendable tandem arrays can easily be determined at once.
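The two definitions above can be illustrated with a short brute-force sketch. It is not the O(n log n) algorithm of the thesis: it scans every candidate period p directly, which takes quadratic time, and it skips the square-finding step entirely. The function names `periodic_substrings` and `tandem_arrays` are hypothetical, and positions are 0-based, unlike the 1-based indices in the abstract.

```python
def periodic_substrings(s):
    """Yield (start, length, p) for every non-extendable p-periodic substring.

    Brute force: for each period p, find maximal stretches of positions i
    with s[i] == s[i + p]. A stretch of L positions corresponds to a
    substring of length m = L + p, kept only if floor(m / p) >= 2.
    """
    n = len(s)
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if s[i] != s[i + p]:
                i += 1
                continue
            start = i
            while i + p < n and s[i] == s[i + p]:
                i += 1
            length = (i - start) + p
            if length >= 2 * p:
                yield start, length, p


def tandem_arrays(s):
    """Yield (start, p, k): s[start:start + p] repeated k >= 2 times.

    Inside a p-periodic substring, a tandem array is not preceded by its
    unit x only if it begins within the first p positions, and it is not
    followed by x only if k is as large as the substring allows.
    """
    for start, length, p in periodic_substrings(s):
        end = start + length
        for t in range(start, min(start + p, end - 2 * p + 1)):
            yield t, p, (end - t) // p


if __name__ == "__main__":
    for t, p, k in tandem_arrays("ACGACGACGT"):
        print(t, "ACGACGACGT"[t:t + p], k)
```

Under these definitions the unit x need not be primitive, so a string such as "aaaa" yields both the array with unit "a" (k = 4) and the array with unit "aa" (k = 2).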
Related resources
Evolution of repeated sequence arrays in the D-loop region of bat mitochondrial DNA.
Analysis of mitochondrial DNA control region sequences from 41 species of bats representing 11 families revealed that repeated sequence arrays near the tRNA-Pro gene are present in all vespertilionine bats. Across 18 species tandem repeats varied in size from 78 to 85 bp and contained two to nine repeats. Heteroplasmy ranged from 15% to 63%. Fewer repeats among heteroplasmic than homoplasmic in...
A new moderately repetitive DNA sequence family of novel organization.
In cloning adenovirus homologous sequences, from a human cosmid library, we identified a moderately repetitive DNA sequence family consisting of tandem arrays of 2.5 kb members. A member was sequenced and several non-adjacent, 15-20 bp G-C rich segments with homology to the left side of adenovirus were discovered. The copy number of 400 members is highly conserved among humans. Southern blots o...
Satellite DNAs between selfishness and functionality: structure, genomics and evolution of tandem repeats in centromeric (hetero)chromatin.
Satellite DNAs (tandemly repeated, non-coding DNA sequences) stretch over almost all native centromeres and surrounding pericentromeric heterochromatin. Although satellite DNAs were once considered inert by-products of genome dynamics in heterochromatic regions, recent studies have shown that satellite DNA evolution is an interplay of stochastic events and selective pressure. This points to a functional significance of satelli...
Evolutionary dynamics of tandem repeats in the mitochondrial DNA control region of the minnow Cyprinella spiloptera.
Length variation due to tandem repeats is now recognized as a common feature of animal mitochondrial DNA; however, the evolutionary dynamics of repeated sequences are not well understood. Using phylogenetic analysis, predictions of three models of repeat evolution were tested for arrays of 260-bp repeats in the cyprinid fish Cyprinella spiloptera. Variation at different nucleotide positions in ...
Plant chromosomes from end to end: telomeres, heterochromatin and centromeres.
Recent evidence indicates that heterochromatin in plants is composed of heterogeneous sequences, which are usually composed of transposable elements or tandem repeat arrays. These arrays are associated with chromatin modifications that produce a closed configuration that limits transcription. Centromere sequences in plants are usually composed of tandem repeat arrays that are homogenized across...